- Newsgroups: comp.sources.misc
- From: allbery@uunet.UU.NET (Brandon S. Allbery - comp.sources.misc)
- Subject: v06i045: Dissection utility for over-large mbox files
- Reply-To: mirk@warwick.UUCP (Mike Taylor)
- Organization: Computer Science, Warwick University, UK
-
- Posting-number: Volume 6, Issue 45
- Submitted-by: mirk@warwick.UUCP (Mike Taylor)
- Archive-name: dissect2
-
- [Okay, so csplit can't be this tricky. Still, you could do wonders with it
- and a shell script wrapper.... Of course, BSD may not have "csplit". ++bsa]
-
- Here is a simple and self-explanatory little number for comp.sources.misc.
- It should run on any UNIX machine, though I've only tried it on a sun3 with
- Berkeley 4.3. It splits a large mbox into individually named personal
- mboxes for each person who has composed one or more of the mbox's
- constituent articles. See the manual page for more details.
-
- -------------------------- Cut here, cheese-heads! --------------------------
- #! /bin/sh
- # This is a shell archive, meaning:
- # 1. Remove everything above the #! /bin/sh line.
- # 2. Save the resulting text in a file.
- # 3. Execute the file with /bin/sh (not csh) to create the files:
- # Makefile
- # Manifest
- # README
- # dissect.1
- # dissect.c
- # This archive created: Sat Jan 21 18:30:57 1989
- # By: Mike Taylor ()
- export PATH; PATH=/bin:$PATH
- if test -f 'Makefile'
- then
- echo shar: will not over-write existing file "'Makefile'"
- else
- cat << \SHAR_EOF > 'Makefile'
- all: dissect.c
- 	cc -O -s dissect.c -o dissect
- 	rm -f count
- 	ln dissect count
-
- dissect: dissect.c
- 	cc -O -s dissect.c -o dissect
-
- count: dissect.c
- 	cc -O -s dissect.c -o count
- SHAR_EOF
- fi # end of overwriting check
- if test -f 'Manifest'
- then
- echo shar: will not over-write existing file "'Manifest'"
- else
- cat << \SHAR_EOF > 'Manifest'
- -rw-r--r-- 1 mirk csother 182 Jan 21 18:01 Makefile
- -rw-r--r-- 1 mirk csother 315 Jan 21 18:28 Manifest
- -rw-r--r-- 1 mirk csother 564 Jan 21 18:24 README
- -rw-r--r-- 1 mirk csother 1955 Jan 21 18:21 dissect.1
- -rw-r--r-- 1 mirk csother 3194 Jan 21 17:57 dissect.c
- SHAR_EOF
- fi # end of overwriting check
- if test -f 'README'
- then
- echo shar: will not over-write existing file "'README'"
- else
- cat << \SHAR_EOF > 'README'
- Evening, all.
-
- This is a program written in a hurry by me one night because I was
- sick of wading through 1/4M mailboxes, trying to find some archaic
- piece of correspondence. It breaks up a large mailbox (or several
- of them, if you like) into smaller ones, named after the sender of
- the pieces of mail they contain. See the manual entry if this is
- unclear. Mail bugs, flames, pieces of frozen vomit, slices of
- intestinal lining etc., to mirk@uk.ac.warwick.cs. That's about it
- really. Lap it up!
-
- PS. 1st man: "My dog's got no nose"
- 2nd man: "Frog off."
- SHAR_EOF
- fi # end of overwriting check
- if test -f 'dissect.1'
- then
- echo shar: will not over-write existing file "'dissect.1'"
- else
- cat << \SHAR_EOF > 'dissect.1'
- .\" @(#)dissect.1 1.17 89/01/20 SMI; from HACKERS 1.1
- .TH DISSECT 1 "20 January 1989"
- .SH NAME
- dissect \- Break up an mbox into smaller mboxes
- .br
- count \- Count number of articles in an mbox
- .SH SYNOPSIS
- .B dissect
- .I filename1
- .I [ filename2 ... ]
- .br
- .B count
- .I filename1
- .I [ filename2 ... ]
- .br
- .SH DESCRIPTION
- .B dissect
- reads through one or more files in mbox format (e.g. the file mbox created
- by most "mail" programs, and the newsgroup files created by rn(1)). It
- creates new files, each named after the sender of an item of mail in one
- of the specified mboxes, and in that file, deposits copies of all mail
- sent by that user, so that together, the new files contain exactly the
- same data as the old ones. If the files that would be created already
- exist, then
- .B dissect
- will append the news items in the specified mboxes onto the end of the
- existing files.
- .B dissect
- will refuse to overwrite any of its arguments.
- .sp
- .B count
- counts how many articles are in each mbox specified on the command-line,
- and prints this on standard output.
- .SH EXAMPLES
- example% ls
- .br
- mbox
- .br
- example% dissect mbox
- .br
- example% ls
- .br
- VIRUS-L cee074 erict jec1 mbox
- .br
- andy chip.uucp hjt martin weemba
- .br
- example% count mbox martin hjt
- .br
- count: 11 items of mail in input file mbox.
- .br
- count: 1 items of mail in input file martin.
- .br
- count: 1 items of mail in input file hjt.
- .br
- example% dissect hjt
- .br
- dissect: won't overwrite input file hjt.
- .SH "SEE ALSO"
- .BR mail (1),
- .BR rn (1)
- .SH BUGS
- .B dissect
- creates the new files using only the local name of the user who sent
- the mail item being saved - thus a piece of mail sent by a user
- .B mirk@uk.ac.warwick.cs
- would be saved in a file called simply
- .B mirk.
- .SH AUTHOR
- .B dissect
- and
- .B count
- were written by Michael Taylor (mirk@uk.ac.warwick.cs) in the early hours
- of the morning of Friday, 20th January, 1989, on Warwick University's
- Sun3 "emerald".
- SHAR_EOF
- fi # end of overwriting check
- if test -f 'dissect.c'
- then
- echo shar: will not over-write existing file "'dissect.c'"
- else
- cat << \SHAR_EOF > 'dissect.c'
- /****************************************************************************\
- |* *|
- |* Dissect.c: a rough-and-ready heap of junk to split a file in mbox *|
- |* format into a number of mbox-format files, each containing *|
- |* all the messages from a sender whose mail was in the *|
- |* original mbox, and named after that sender. *|
- |* *|
- |* Also: it will count the number of articles in each mbox in its *|
- |* argument list, when called with argv[0] not equal to *|
- |* dissect. *|
- |* *|
- |* This program written in the early hours of 21st January 1989. *|
- |* Copyright (C) 1989 by Mike Taylor. No rights reserved - copy me! *|
- |* *|
- \****************************************************************************/
-
- #include <stdio.h>
- #include <strings.h>
-
- #define LINELEN 1024
-
- extern char *fgets ();
- static int onlycount = 0;
-
- /*--------------------------------------------------------------------------*/
-
- int handle (argv, index)
- char **argv;
- int index;
- {
-     FILE *fp;
-     FILE *to = NULL;
-     static char name[LINELEN];
-     static char line[LINELEN];
-     static char last[LINELEN];
-     char *cp;
-     int flag = 0;
-
-     if ((fp = fopen (argv[index], "r")) == NULL) {
-         (void) fprintf (stderr, "%s: couldn't open input file %s.\n",
-                         argv[0], argv[index]);
-         return (1);
-     }
-
-     /* The start of each file counts as following a blank line. */
-     (void) strcpy (last, "\n");
-
-     while (fgets (line, LINELEN, fp) != NULL) {
-         /* A message starts at a "From " line that follows a blank line. */
-         if ((!strncmp (line, "From ", 5)) && (*last == '\n')) {
-             flag++;
-             if (!onlycount) {
-                 /* Close the previous sender's file, if one is open. */
-                 if (to != NULL) {
-                     (void) fclose (to);
-                     to = NULL;
-                 }
-                 /* Keep only the local name; also stop at end of line. */
-                 (void) strcpy (name, line+5);
-                 for (cp = name; (*cp != '\0') && (*cp != '\n') &&
-                      (*cp != ' ') && (*cp != '@') && (*cp != '%'); cp++);
-                 *cp = '\0';
-                 if (!strcmp (name, argv[index])) {
-                     /* to stays NULL, so this message is not copied. */
-                     (void) fprintf (stderr, "%s: won't overwrite input file %s.\n",
-                                     argv[0], argv[index]);
-                 }
-                 else if ((to = fopen (name, "a")) == NULL) {
-                     (void) fprintf (stderr, "%s: couldn't open output file %s.\n",
-                                     argv[0], name);
-                     (void) fclose (fp);
-                     return (1);
-                 }
-             }
-         }
-         if ((to != NULL) && (!onlycount))
-             (void) fputs (line, to);
-         (void) strcpy (last, line);
-     }
-     if (to != NULL)
-         (void) fclose (to);
-     (void) fclose (fp);
-
-     if (flag == 0)
-         (void) fprintf (stderr, "%s: found no mail in input file %s.\n",
-                         argv[0], argv[index]);
-     else
-         if (onlycount)
-             (void) printf ("%s: %3d items of mail in input file %s.\n",
-                            argv[0], flag, argv[index]);
-     return (flag == 0);
- }
-
- /*--------------------------------------------------------------------------*/
-
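- main (argc, argv)
- int argc;
- char **argv;
- {
-     int status = 0;
-     int i;
-     char *prog;
-
-     if (argc == 1) {
-         (void) fprintf (stderr, "Usage: %s file [ file ... ]\n", argv[0]);
-         exit (255);
-     }
-
-     /* Compare only the last path component, so "./dissect" still dissects. */
-     if ((prog = rindex (argv[0], '/')) != NULL)
-         prog++;
-     else
-         prog = argv[0];
-     if (strcmp (prog, "dissect"))
-         onlycount = 1;
-
-     for (i = 1; i < argc; i++)
-         status += handle (argv, i);
-
-     exit (status);
- }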
-
- /*--------------------------------------------------------------------------*/
- SHAR_EOF
- fi # end of overwriting check
- # End of shell archive
- exit 0
- ______________________________________________________________________________
- Mike Taylor - {Christ,M{athemat,us}ic}ian ... Email to: mirk@uk.ac.warwick.cs
- *** Unkle Mirk sez: "Em9 A7 Em9 A7 Em9 A7 Em9 A7 Cmaj7 Bm7 Am7 G Gdim7 Am" ***
- ------------------------------------------------------------------------------
-